For this example exercise, we will use a dataset of NYSE stock prices mixed with some data on US airlines.
We will look for some correlations, some of which will make sense, and some of which won't.
Note: This isn't necessarily real stock data. It is a mix of real and synthetic data for teaching purposes. If you want to play around with real stock data, there are lots of sources; Google Finance and Yahoo Finance are particularly user-friendly.
In [34]:
# import Pandas
import pandas as pd
# making plots inline
%matplotlib inline
In [35]:
# read the data (if you have any questions about these relative paths and index_col, please feel free to discuss)
data = pd.read_csv('../data/stocks_and_airliners.csv', index_col='date')
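To illustrate what index_col does, here is a minimal sketch that reads a tiny in-memory CSV instead of the real file. The column names ('Stock A', 'Passengers') and values are made up for illustration and are not taken from the actual dataset.

```python
import io
import pandas as pd

# Hypothetical miniature CSV standing in for the real file
csv_text = """date,Stock A,Passengers
2015-01-01,10.0,100
2015-02-01,11.5,120
2015-03-01,12.0,90
"""

# index_col='date' turns the 'date' column into the row index
# instead of leaving it as an ordinary column
toy = pd.read_csv(io.StringIO(csv_text), index_col='date')

print(toy.index.name)     # name of the index
print(list(toy.columns))  # 'date' is no longer among the columns

# rows can now be looked up by their date label
print(toy.loc['2015-02-01', 'Stock A'])
```

Without parse_dates=True the index holds plain strings; passing that argument as well would probably be the better choice for time-series work.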
In [36]:
# quick preview
data.head()
Out[36]:
In [37]:
# what data do we have here?
data.columns
Out[37]:
Let's look at one example of Pearson correlation and one of Spearman correlation.
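Before turning to the real columns, a small sketch on made-up numbers may help show how the two differ. Pearson measures linear association, while Spearman is the Pearson correlation of the ranks, so it only cares whether the relationship is monotonic. The series x and y below are invented for illustration.

```python
import pandas as pd

# Made-up data: y increases whenever x does, but not linearly
x = pd.Series(range(1, 11), dtype=float)
y = x ** 3

# Pearson: strength of the *linear* relationship
pearson = x.corr(y)  # the default method is 'pearson'

# Spearman: Pearson correlation computed on the ranks
spearman = x.rank().corr(y.rank())

print(f"Pearson:  {pearson:.3f}")   # below 1, the relationship is curved
print(f"Spearman: {spearman:.3f}")  # 1, the relationship is perfectly monotonic
```

Computing the ranks by hand is just a way to make the definition visible; in practice, passing method='spearman' to corr gives the same number.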
In [39]:
# what is the Pearson correlation between Accenture stock prices and Adobe stock prices?
data['Accenture plc'].corr(data['Adobe Systems Inc'], method='pearson')
Out[39]:
In [40]:
# plot both price series on the same axes to compare them visually
data[['Accenture plc', 'Adobe Systems Inc']].plot(figsize=(16, 5))
Out[40]:
In [46]:
# what is the Spearman correlation between American Airlines Group stock price and the number of domestic passengers?
data['American Airlines Group'].corr(data['Passengers_Domestic'], method='spearman')
Out[46]:
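Checking pairs one at a time gets tedious when hunting for correlations across many columns. One possible shortcut, sketched here on a made-up frame, is DataFrame.corr, which returns the whole matrix of pairwise correlations at once. The column names and values are hypothetical stand-ins, not the real data.

```python
import pandas as pd

# Made-up frame standing in for a few columns of the real dataset
toy = pd.DataFrame({
    'Stock A': [10.0, 11.0, 12.5, 12.0, 14.0],
    'Stock B': [20.0, 21.0, 23.0, 22.5, 25.0],
    'Passengers': [300, 280, 310, 250, 260],
})

# Pairwise Spearman correlation between every pair of columns
corr_matrix = toy.corr(method='spearman')

print(corr_matrix.round(2))
```

On the real data the same call would be data.corr(method='spearman'). A high value in the matrix is only a prompt to look closer, since some of these correlations will not make sense.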